Monitoring Process Change with Bayesian Methods

Insurely You’re Joking

Mick Cooney mickcooney@gmail.com

2017-08-21

Introduction

Structure of Talk


  • Discussion of Problem
  • Bayesian analysis and the Beta distribution
  • Adding layers of noise
  • Distribution distances and f-divergences

Problem Discussion


Not Change-point analysis


Measure change effect


Signal vs noise

Want generic technique

Sales-call Conversions


Binary outcome (0 or 1)


Monthly summaries


Sales due to faster turnaround

Bayesian Analysis

Bayes Rule


\[ P(A | B) = \frac{P(B|A) P(A)}{P(B)} \]

Continuous Form


\[ p(\theta | D) \propto p(D | \theta) \, p(\theta) \]


where


\[\begin{eqnarray*} p(\theta) &=& \text{Prior distribution for $\theta$} \\ p(D | \theta) &=& \text{Probability of seeing data $D$ given value $\theta$} \\ p(\theta | D) &=& \text{Posterior distribution for $\theta$} \end{eqnarray*}\]

Binomial Likelihood


Single trial:

\[ p(y|\theta) = \theta^y (1 - \theta)^{1-y} \]

\(n\) trials, \(k\) successes:

\[ p(k | \theta) = \binom{n}{k} \, \theta^k (1 - \theta)^{n-k} \]
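As a quick numerical check of the likelihood (a Python sketch; the talk's own code is in R, and the function name is illustrative):

```python
from math import comb

def binomial_likelihood(k, n, theta):
    """P(k successes in n trials | conversion rate theta)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

# 50 conversions from 500 calls: the likelihood peaks near theta = 0.10
print(binomial_likelihood(50, 500, 0.10))
print(binomial_likelihood(50, 500, 0.05))
```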

Beta Distribution


\[ p(\theta) = Beta(\alpha, \beta) \]


\[ p(\theta | D) = Beta(\alpha + k, \beta + n - k) \]
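The conjugate update is a one-liner: add successes to \(\alpha\), failures to \(\beta\). A minimal Python sketch (helper name is illustrative, not from the talk):

```python
def beta_update(alpha, beta, k, n):
    """Posterior Beta parameters after observing k successes in n trials."""
    return alpha + k, beta + n - k

# Flat Beta(1, 1) prior updated with 50 conversions from 500 calls
alpha_post, beta_post = beta_update(1, 1, 50, 500)
print(alpha_post, beta_post)                   # 51 451
print(alpha_post / (alpha_post + beta_post))   # posterior mean, ~0.1016
```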

Attacking the Problem

First Attempt


Generate data


Calculate yearly posterior distributions


Graph it

Randomise Monthly Calls


Had 500 calls per month


Treat as Poisson process


\[ C \sim Pois(500) \]
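The simulated monthly data can then be drawn in two stages: Poisson call counts, then binomial conversions given those counts (a Python sketch; the rate and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)

n_months, mean_calls, conv_rate = 12, 500, 0.10

calls       = rng.poisson(mean_calls, size=n_months)   # C ~ Pois(500)
conversions = rng.binomial(calls, conv_rate)           # k | C ~ Binomial(C, 0.10)

print(calls)
print(conversions)
```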

Stochastic Conversion Rate


Add noise to the underlying rate?


A Simpler Approach


Aggregate yearly


Look at yearly conversions

Why the discrepancy in outputs?

Prior Data

Strength of Prior


Quantity of data accumulates


Prior very strong

Need to rethink priors

Prior represents knowledge


How confident are we?

Constructing Priors


Balancing act


Estimate \(\theta\), assign a strength

Reparameterise \(Beta(\alpha, \beta)\)


\[ Beta(\alpha, \beta) \rightarrow Beta(\mu K, (1 - \mu) K) \]


\[\begin{eqnarray*} \mu &=& \text{probability expectation} \\ K &=& \text{strength of belief} \end{eqnarray*}\]
stoc_count_tbl %>%
    filter(rate_date < as.Date('2016-01-01')) %>%
    summarise(conv_count = sum(conversion_count)
             ,call_count = sum(call_count)
             ,rate       = conv_count / call_count
              )
## # A tibble: 1 x 3
##   conv_count call_count      rate
##        <int>      <int>     <dbl>
## 1       3477      35934 0.0967607

Assume 1 year of ‘strength’


\(K = 12 \times 500 = 6,000\)


\[\begin{eqnarray*} \mu &=& 0.0967607 \\ K &=& 6,000 \end{eqnarray*}\]
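Plugging these numbers into the reparameterisation recovers the prior's shape parameters (Python sketch; function name is illustrative):

```python
def beta_from_mean_strength(mu, K):
    """Convert the (mu, K) parameterisation back to Beta(alpha, beta)."""
    return mu * K, (1 - mu) * K

mu, K = 0.0967607, 6000
alpha, beta = beta_from_mean_strength(mu, K)
print(alpha, beta)   # ~580.6 and ~5419.4
```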

Moving Signal/Noise Ratio


\[ \mathcal{N}(0.10, 0.02) \rightarrow \mathcal{N}(0.15, 0.02) \]


Mean-shift well outside variance

Increase Noise


\[ \mathcal{N}(0.40, 0.08) \rightarrow \mathcal{N}(0.45, 0.08) \]


Can we see difference?

Very hard to spot a change!

Analysis for \(\mu = 0.40\)

How can we quantify differences?

f-divergences

Distributional Differences


A metric or distance:

\[ d : X \times X \rightarrow \mathbb{R}^{+} \]
\[\begin{align*} d(x, y) &\geq 0 \; \forall x, y \in X, && \text{ non-negativity} \\ d(x, y) &= 0 \; \iff \; x = y \; \forall x, y \in X, && \text{ identity of indiscernibles} \\ d(x, y) &= d(y, x) \; \forall x, y \in X, && \text{ symmetry} \\ d(x, z) &\leq d(x, y) + d(y, z) \; \forall x, y, z \in X, && \text{ triangle inequality} \\ \end{align*}\]

Common-Area Metric


\[ D(P, Q) = 1 - \int^1_0 \min(p(x), q(x)) \, dx \]

Kullback-Leibler Divergence


\[ D_{KL}(P||Q) = \int^1_0 p(x) \ln \frac{p(x)}{q(x)} \, dx \]


Not symmetric


No triangle inequality


Intuitive information theory interpretation

Hellinger Distance


\[ H^2(P, Q) = 1 - \int \sqrt{p(x) q(x)} \, dx \]

\[ 0 \leq H(P, Q) \leq 1 \]

\[ H^2(P, Q) \leq \delta(P, Q) \leq \sqrt{2} \, H(P, Q) \]

where \(\delta(P, Q)\) is the total variation distance.
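`calculate_metrics` is the talk's R helper; a Python sketch of the same three quantities, computed on a grid with scipy, might look as follows (the function name mirrors the R one but the implementation is otherwise an assumption; the `hellinger` value stores the squared Hellinger distance \(H^2\), which is what the printed R outputs appear to report):

```python
import numpy as np
from scipy.stats import beta
from scipy.integrate import trapezoid

def calculate_metrics(x_seq, dist_p, dist_q, eps=1e-300):
    """Common-area distance, squared Hellinger distance and KL divergence."""
    p = dist_p.pdf(x_seq)
    q = dist_q.pdf(x_seq)
    commonarea = 1 - trapezoid(np.minimum(p, q), x_seq)       # 1 - overlap
    hellinger  = 1 - trapezoid(np.sqrt(p * q), x_seq)         # H^2
    kl         = trapezoid(p * np.log((p + eps) / (q + eps)), x_seq)
    return {'commonarea': commonarea, 'hellinger': hellinger, 'kl': kl}

x_seq = np.linspace(0.001, 0.999, 10000)
Beta1 = beta(0.10 * 6000, 0.90 * 6000)    # mu = 0.10, K = 6,000
Beta2 = beta(0.10 * 7000, 0.90 * 7000)    # mu = 0.10, K = 7,000

print(calculate_metrics(x_seq, Beta1, Beta1))
print(calculate_metrics(x_seq, Beta1, Beta2))
```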

f-div Values for Beta Distribution


\[ \mu = 0.10 \;\; K_1 = 6,000 \;\; K_2 = 7,000 \;\; K_3 = 12,000 \]

calculate_metrics(x_seq, Beta1, Beta1) %>% print(digits = 2)
## commonarea  hellinger         kl 
##    4.4e-16    4.4e-16    0.0e+00
calculate_metrics(x_seq, Beta1, Beta2) %>% print(digits = 2)
## commonarea  hellinger         kl 
##     0.0373     0.0015     0.0063
calculate_metrics(x_seq, Beta1, Beta3) %>% print(digits = 2)
## commonarea  hellinger         kl 
##      0.166      0.029      0.153

Construct Ideal Data


Fix \(\mu_1\)


Have 1 year of data as prior, \(K_1 = 6,000\)


Set new \(\mu_2\)


Check distribution:


Two months, \(K_2 = 7,000\); one year, \(K_3 = 12,000\)
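The three distributions follow directly from the \((\mu, K)\) bookkeeping above (a Python sketch; scipy's frozen Beta objects stand in for the talk's R objects, and the extra call counts assume 500 calls per month):

```python
from scipy.stats import beta

mu_1, mu_2 = 0.10, 0.11
K_1 = 6000                      # one year of prior data at mu_1

Beta1 = beta(mu_1 * K_1, (1 - mu_1) * K_1)

# Two further months at the new rate mu_2: 1,000 extra calls, K_2 = 7,000
Beta2 = beta(mu_1 * K_1 + mu_2 * 1000, (1 - mu_1) * K_1 + (1 - mu_2) * 1000)

# A full further year at mu_2: 6,000 extra calls, K_3 = 12,000
Beta3 = beta(mu_1 * K_1 + mu_2 * 6000, (1 - mu_1) * K_1 + (1 - mu_2) * 6000)

# The posterior mean drifts from mu_1 towards mu_2 as new data accumulates
print(Beta1.mean(), Beta2.mean(), Beta3.mean())
```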

Small Move: \(\mu_1 = 0.10 \;\; \mu_2 = 0.11\)

calculate_metrics(x_seq, Beta1, Beta1) %>% print(digits = 4)
## commonarea  hellinger         kl 
##  4.441e-16  4.441e-16  0.000e+00
calculate_metrics(x_seq, Beta1, Beta2) %>% print(digits = 4)
## commonarea  hellinger         kl 
##    0.15522    0.01974    0.08640
calculate_metrics(x_seq, Beta1, Beta3) %>% print(digits = 4)
## commonarea  hellinger         kl 
##     0.5592     0.2624     1.8189

Larger Move: \(\mu_1 = 0.10 \;\; \mu_2 = 0.15\)

calculate_metrics(x_seq, Beta1, Beta1) %>% print(digits = 4)
## commonarea  hellinger         kl 
##  4.441e-16  4.441e-16  0.000e+00
calculate_metrics(x_seq, Beta1, Beta2) %>% print(digits = 4)
## commonarea  hellinger         kl 
##     0.6553     0.3602     1.9564
calculate_metrics(x_seq, Beta1, Beta3) %>% print(digits = 4)
## commonarea  hellinger         kl 
##     0.9997     0.9981    39.1994

What about our data?

Analyse Conversion Data


We have monthly call data


Have posterior distributions


Calculate metrics as data updates
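The monitoring loop itself is then only a few lines: freeze the prior, fold each month's counts into the posterior, and track a divergence against the prior. A Python sketch (all data simulated, seed and rates illustrative; the distance used is the common-area distance from earlier):

```python
import numpy as np
from scipy.stats import beta as beta_dist
from scipy.integrate import trapezoid

rng = np.random.default_rng(2017)
x_seq = np.linspace(0.001, 0.999, 5000)

def common_area_distance(p_dist, q_dist):
    """1 minus the overlap of the two densities, computed on a grid."""
    p, q = p_dist.pdf(x_seq), q_dist.pdf(x_seq)
    return 1 - trapezoid(np.minimum(p, q), x_seq)

# Prior: mu = 0.10 with one year of strength, K = 6,000
alpha, beta = 600.0, 5400.0
prior = beta_dist(alpha, beta)

true_rate = 0.15   # the (unknown) post-change conversion rate
for month in range(1, 13):
    calls = rng.poisson(500)
    convs = rng.binomial(calls, true_rate)
    alpha, beta = alpha + convs, beta + calls - convs   # conjugate update
    d = common_area_distance(prior, beta_dist(alpha, beta))
    print(f"month {month:2d}: distance = {d:.3f}")
```

The distance grows month by month as the posterior pulls away from the prior; a change is flagged once it crosses a chosen threshold.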

Low-Noise Move: \(\mu = 0.10 \rightarrow 0.15\)

High-Noise Move: \(\mu = 0.40 \rightarrow 0.45\)

Conclusion

Summary


Binomial process with known change point


Model with Beta distribution


Aggregate data appropriately


Distribution plots and f-divergence metrics


Decide on thresholds

Future Extensions


Try with other processes / distributions


More comprehensive behaviour investigation


Look at statistical distance


Time-series methods

Questions?


mickcooney@gmail.com


https://github.com/kaybenleroll/dublin_r_workshops


Blog post:


http://blog.applied.ai/a-bayesian-approach-to-monitoring-process-change/